Learning Morphology: Algorithms for the Identification of the Stem Changes
Abstract
The aim of the current work is to create tools for the automatic recognition of the Estonian stem-changing rules. The main problem consists in bringing together the formal classification features available to the computer and classification based on human knowledge. This paper introduces two algorithms. First, in STLearn the supervised inductive learning technique is used to find the features suitable for automatic recognition of the stem changes. Two stem variants can be bound by more than one stem change; the second algorithm is created for identifying the whole set of rules for stem pairs.

The current work is part of a project based on the open model of language [Viks94], according to which all regular and productive phenomena of natural language are represented by different types of rules, and irregular phenomena are listed in small dictionaries (exception lists). This approach makes it possible to process regular words not listed in dictionaries: new derivatives, loan-words, etc. The subsystem of morphology plays the central role in processing morphologically complex languages such as Estonian. The number of possible stem variants varies strongly in Estonian: in some inflection types there are no stem variants at all, while in others a word can have as many as five different regular stem variants.

The current work presents tools for creating a formal description of the Estonian stem-changing rules, starting from a pair of stem variants. The Concise Morphological Dictionary of Estonian (CMD) [Viks92] serves as the basis for the current work; it contains over 36 000 headwords, each of which has two stem variants on average. The principal types of changes are the following:

1. Stem-grade changes. A stem can occur in either a strong or a weak grade; the grades are differentiated first of all by phonetic quantity (2nd or 3rd degree of quantity, marked by '), which may be accompanied by various sound changes involving the medial sounds. For instance, the members of the stem pair hõive-h'õive are distinguished only by the different phonetic quantity; in the case of the pair aabe-'aape the rewriting rule b → p is concurrent with the phonetic quantity change.
2. Stem-end changes. A stem can appear either as a lemmatic stem or as an inflection stem; the stem variants are differentiated by changes involving the final sounds (e.g. 'aadel-aadli, j'alg "foot"-j'alga, sipelgas "ant"-sipelga).
3. Secondary changes. These changes are conditioned by a certain context arising after either the stem-end or the stem-grade change (e.g. k'uppel "dome" → *k'uppli → k'upli).

About 20 % of stems remain unchanged; most of the others undergo a stem-end change, a stem-grade change, or both at the same time. Formally, the recognition of the stem change rules can be reduced to a classification task with string pairs as the objects to classify and the possible stem-change rules as the classes. The system has to create class descriptions from the available data: characters and their membership in sound classes. An important requirement for the classification system is the linguistical...
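The reduction to a classification task over string pairs can be illustrated with a minimal Python sketch. It is not STLearn and not the rule-set algorithm of the paper; it only shows one way a stem pair might be turned into the kind of formal features the abstract mentions (characters and their sound classes). The function name `describe_pair`, the feature names, and the two-way vowel/consonant split are assumptions made for illustration.

```python
from os.path import commonprefix  # character-wise common prefix of strings

QUANTITY_MARK = "'"          # quantity mark as written in the stem examples
VOWELS = set("aeiouõäöü")    # assumed, deliberately coarse sound classes


def sound_class(ch):
    """Map a character to a hypothetical sound class: V(owel) or C(onsonant)."""
    return "V" if ch in VOWELS else "C"


def describe_pair(stem_a, stem_b):
    """Return simple formal features describing how stem_b differs from stem_a."""
    # Record whether the quantity mark is present, then compare unmarked strings.
    marked_a = QUANTITY_MARK in stem_a
    marked_b = QUANTITY_MARK in stem_b
    a = stem_a.replace(QUANTITY_MARK, "")
    b = stem_b.replace(QUANTITY_MARK, "")

    # Split each stem into shared prefix, differing middle, shared suffix.
    prefix = commonprefix([a, b])
    rest_a, rest_b = a[len(prefix):], b[len(prefix):]
    suffix = commonprefix([rest_a[::-1], rest_b[::-1]])[::-1]
    mid_a = rest_a[:len(rest_a) - len(suffix)]
    mid_b = rest_b[:len(rest_b) - len(suffix)]

    if not (mid_a or mid_b):
        position = "none"      # the unmarked strings are identical
    elif suffix:
        position = "medial"    # the difference is followed by shared material
    else:
        position = "final"     # the difference reaches the end of the stem

    return {
        "quantity_change": marked_a != marked_b,
        "rewrite": (mid_a, mid_b),
        "position": position,
        "class_pattern": (
            "".join(sound_class(c) for c in mid_a),
            "".join(sound_class(c) for c in mid_b),
        ),
    }


if __name__ == "__main__":
    for pair in [("j'alg", "j'alga"), ("sipelgas", "sipelga"), ("'aadel", "aadli")]:
        print(pair, describe_pair(*pair))
```

A learner in the spirit of the abstract would take feature records like these, together with rule labels assigned by a linguist, as training data; the sketch stops at feature extraction and makes no claim about how the actual class descriptions are induced.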
Publication date: 1996